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Abstract 


This  work  focuses  on  the  use  of  truncated  Gaussian  distributions  as  models  for  bounded  data  - 
measurements  that  are  constrained  to  appear  between  fixed  limits.  We  prove  that  the  truncated 
Gaussian  can  be  viewed  as  a  maximum  entropy  distribution  for  truncated  bounded  data,  when 
mean  and  covariance  are  given.  We  present  the  characteristic  function  for  the  truncated  Gaussian; 
from  this,  we  derive  algorithms  for  calculation  of  mean,  variance,  summation,  application  of  Bayes 
rule  and  filtering  with  truncated  Gaussians.  As  an  example  of  the  power  of  our  methods,  we 
describe  a  derivation  of  the  disparity  constraint  (used  in  computer  vision)  from  our  models.  Our 
approach  complements  results  in  Statistics,  but  our  proposal  is  not  only  to  use  the  truncated 
Gaussian  as  a  model  for  selected  data;  we  propose  to  model  measurements  as  fundamentally 
bounded  in  terms  of  truncated  Gaussians. 
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Introduction 


This  work  presents  a  new  class  of  statistical  models  that  are  well  suited  for  several  Robotics 
applications,  such  as  object  recognition  or  computer  vision.  Our  approach  deals  with  bounded 
data:  measurements  that  are  constrained  to  appear  in  a  bounded  region  in  the  measurement  space. 
Bounded  measurements  have  been  studied  in  Statistics  in  connection  with  selection  mechanisms; 
here  we  propose  a  different  approach,  in  which  the  data  is  considered  to  be  bounded  to  begin  with. 

To  date,  few  statistical  models  for  bounded  variables  are  available,  none  of  them  satisfactory. 
The  most  common  approach  is  to  use  the  Gaussian  distribution  and  model  bounds  through  an 
ad  hoc  selection  medianism  [2,  1].  Another  possibility  is  the  uniform  distribution  [8],  but  this 
approach  has  computational  problems:  summation  of  uniform  variables  does  not  yield  a  uniform 
variable  and  application  of  Bayes  rule  is  hard  [11].  In  short:  even  though  bounds  contain  a  lot  of 
information,  they  have  not  received  proper  attention  yet. 

Our  work  uses  a  class  of  distributions  in  the  truncated  Gaussian  family  in  order  to  model 
bounded  data.  We  derive  a  complete  set  of  tractable  algorithms  for  these  models:  calculation  of 
moments,  approximation  methods  for  Bayes  rule  and  summation  and  noise  filtering.  Overall,  our 
results  make  the  truncated  Gaussian  family  an  operational  tool,  much  more  powerful  than  the 
uniform  or  the  Gaussian  distributions. 

In  order  to  illustrate  the  strength  of  our  approach,  we  present  a  statistical  derivation  of  the 
disparity  constraint  used  in  Computer  Vision.  So  far  no  statistical  analysis  has  been  given  for  this 
constraint. 

Our  analysis  complements  results  scattered  in  the  literature  of  Statistics,  Information  Theory 
and  Control  Theory.  We  contribute  to  Robotics  by  indicating  a  proper  way  to  model  bounded 
measurements  and  deriving  tractable  algorithms  to  handle  them.  Besides  contributing  to  Robotics, 
our  algorithms  demonstrate  that  Robotics  has  much  to  contribute  to  Statistics  itself. 


2  The  Truncated  Gaussian  Family 


Our  basic  model  is  the  elliptically  truncated  Gaussian  family.  A  distribution  in  this  family  is 
proportional  to  a  Gaussian  inside  an  ellipsoid  and  is  zero  outside  the  ellipsoid.  The  truncated 
Gaussian  model  has  been  proposed  in  a  variety  of  contexts  in  Statistics  [9]  as  models  of  selection 
mechanisms.  In  Robotics,  the  first  explicit  mention  of  the  possibility  of  using  the  truncated 
Gaussian  appears  to  be  by  Erdmann  [5]. 

A  truncated  Gaussian  distribution  for  a  n-dimensional  random  vector  x  is  referred  to  as 
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Nt,^M,k{fJ-,Py,  its  mathematical  expression  is: 


(1) 


where  /(•)  is  the  indicator  function  and  q(n,P,i',M,k)  is  a  normalizing  constant.  The  set  {a:  : 
(x  _  u)'^M~^{x  —  v)  <  k}  defines  an  ellipsoid  in  n-dimensional  space.  Call  k  the  radius  of  the 
distribution. 


As  a  special  case,  note  that  if  fx  =  i>  and  M  =  then  the  normalizing  constant  depends  only 

on  k:  ^ 

q(f^,P,t‘,P,k)  =  Pr(xl<k)=  I 

*/0 

In  this  case,  q{n,  P,  //,  P,  k)  is  the  value  (at  k)  of  the  distribution  function  of  a  chi-square  variable 
with  n  degrees  of  freedom.  Due  to  the  importance  of  this  sub-family  for  modeling  purposes,  we 
call  it  the  radially  truncated  Gaussian  family. 


2.1  Genesis  of  a  Truncated  Gaussian 

There  are  other  situations  where  a  truncated  model  is  appropriate  because  data  is  purposefully 
truncated.  Consider  an  n-dimensional  source  of  (unbounded)  Gaussian  noise  and  an  ellipsoid  in 
this  n-dimensional  space.  If  any  measurement  outside  the  ellipsoid  is  discarded,  then  the  resulting 
data  obeys  a  truncated  Gaussian.  Examples  of  these  procedures  are  the  algorithms  of  Cox  [2] 
and  Bar-Shalom/Fortmann  [1].  These  algorithms  use  the  Gaussian  distribution  associated  with 
ad  hoc  selection  mechanisms;  none  of  these  algorithms  uses  appropriate  models  for  bounded  data. 
This  approach  to  bounded  data  is  employed  in  Statistics  in  order  to  model  selection  mechanisms 
[9].  Results  derived  in  this  work  can  be  understood  as  new  tools  for  this  type  of  analysis. 

There  is  a  different  way  of  looking  at  bounded  data.  We  can  use  bounded  models  in  order  to 
model  data  that  is  fundamentally  bounded,  not  bounded  as  the  result  of  a  selection.  We  elaborate 
on  that.  Suppose  we  know  a  random  vector  x  has  mean  //,  covariance  matrix  Q,  and  Pr{x)  =  0 
for  all  X  outside  {x  :  {x  —  hYQ~^{x  —  fx)  <  k}.  In  other  words,  possible  values  of  x  concentrate 
around  the  mean  in  a  symmetric  fashion,  up  to  the  distance  k  in  the  metric  induced  by  Q.  Under 
these  conditions,  we  have  (proof  in  Appendix  A): 

Theorem  1  Given  a  expected  value  is  p,  a  covariance  matrix  Q  and  the  fact  that  a  distribution 
is  zero  outside  the  set  {x  :  (re  —  p)'^Q~^{x  -  p)  <  k},  a  maximum  entropy  distribution  that  obeys 
these  conditions  is  a  truncated  Gaussian  N^^cQ,k'{ti’jcQ))  where  c  =  Pr{xn  —  ^){P'^iXn+2  —  ^))  ^ 
and  k'  =  kPr{xl^^  <  k){Pr{xl  < 

This  theorem  strengthens  the  parallel  between  truncated  and  unbounded  Gaussians. 
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Figure  1:  Distributions  in  the  truncated  Gaussian  family:  A^o,i,o.oi(0, 1)  (left)  and  A^i.5,i,i(0, 1) 
(right). 

The  truncated  Gaussian  family  is  even  more  powerful  than  the  unbounded  Gaussian.  The 
truncated  Gaussian  family  can  represent  highly  skewed  distributions  and  distributions  that  are 
nearly  constant.  Figure  1  illustrates  this  claim. 


2.2  Numerical  Evaluation  of  g(/x,  P,  z/,  M,  k) 

The  central  problem  in  the  characterization  of  a  truncated  Gaussian  is  the  determination  of 
q{ljL,P,i/,M,k).  We  introduce  an  important  linear  transformation  that  will  be  used  throughout 
the  paper. 

It  is  always  possible  to  transform  the  original  expression  of  a  truncated  Gaussian  into  the 
following: 

[2w)2  \  ^  / 

where  P  is  a  diagonal  matrix.  The  result  is  obtained  through  a  linear  transformation,  named 
double  diagonalization: 

z  =  $^\/A  V'^{x  —  fi) 

where:  A  is  the  eigenvalue  matrix  of  P,  V  is  the  corresponding  eigenvector  matrix  of  P  and  $  is 
the  eigenvector  matrix  of  {y/X  V'^)M{Vy/K  ).  The  transformation  is  always  possible  since  P 

and  M  are  positive  definite.  The  vector  w  is  j^$^\/A  V^()U  —  t')! . 

We  work  with  the  transformed  variable  z,  since  q{u:,D,k)  =  q{fi,P,i/,M,k).  Then  (d,-  the 
inverse  of  the  ith  element  in  the  diagonal  of  D): 

g{u,,D,k)  =  Pr  -0,0“  <  fc)  = 


Kotz,  Johnson  and  Boyd  [9]  give  a  numerical  method  for  the  evaluation  of  this  integral  based 


on  its  Laguerre  expansion.  We  have: 


/  r.  fc  o°  (j  —  iV 

q{uj,D,k)  =  Pr{xl  <kl^)+  j  e  '^^CjLj.i{k/{2^)) ^ 

where: 

,  ,  ,  (-y)''  r(r  +  n/2  +  l) 

;^i!(r-i)!r(i  +  n/2  +  l) 

r— 1 

Cr  =  (2r)  ^  Sr-jCj  ,  Co  =  1 

i=0 

j—1  j=l 

Convergence  is  uniform  for  /?  >  in  general,  large  values  of  /3  yield  slow  convergence.  If 

we  truncate  the  evaluation  of  the  series  at  j  =  N,  the  truncation  error  is  always  smaller  than: 

2“^  exp  (kf{2l3)  +  (n  —  1)  log  2  +  4u;'^ojj  . 

As  a  one-dimensional  example,  consider  Afi.5,i,i(0, 1)  (shown  in  figure  1).  Numerical  integration 
with  10  significant  digits  yields  §(0, 1, 1.5, 1, 1)  =  0.3023278734.  Using  ^  =  1.1,  A’  =  9,  we  get  the 
same  answer. 


2.3  Moment  Generating  Function  of  a  Truncated  Gaussian 

The  moment  generating  function  of  any  distribution  is  a  fundamental  tool  in  statistical  analysis. 
We  give  here  an  expression  for  the  moment  generating  function  of  a  truncated  Gaussian  in  the 
most  general  form  (proof  in  Appendix  A): 


Theorem  2  The  moment  generating  function  of  a  truncated  Gaussian  Nu,D,ki^,  I)  is: 

q{aj,  D,k)  ^  V  2  y  ■ 


(2) 


For  a  particular  value  of  t,  we  can  evaluate  <j>{t)  using  Kotz,  Johnson  and  Boyd  recursions  for 
q{u>  —  t,  D,  k)  and  then  plugging  the  result  in  (^(cu,  D,  k))~^q{aj  —  t,  D,  k)  exp(t’^t/2). 
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2.4  Mean  Vector  and  Covariance  Matrix  of  a  Truncated  Gaussian 


The  mean  and  covariance  can  be  obtained  by  successive  differentiations  of  ^{t)  [10].  Since  Kotz, 
Johnson  and  Boyd  recursions  for  <f>{t)  are  uniformly  convergent  we  can  differentiate  these  expres¬ 
sions  term  by  term.  We  use  Ji  for  the  mean  and  P  for  the  covariance  of  a  truncated  Gaussian. 

Direct  differentiation  of  2  yields: 

Theorem  3  We  have: 

CO 

A*  —  ^j9j 

j=i 

oo 

P  =  I  +  koY^kjGj, 
i=i 

where  (kr  are  scalars,  Qr  and  hr  are  vectors,  Gr  and  Hr  are  matrices  and  I  is  an  identity  matrix): 

_  u  -  ly-Lj-imm)  .  _  AA V'"  _£*_ 

-  r(n/2  +  j)  ’  W)  q{w,D,k) 

r— 1 

9r  —  (2r)  (^hr-jCj  Sr-j9j)  1  9o  —  ^ 
j=0 

hr  =  ^  diag  [di(l  - 

r— 1 

Gr  =  (2r)  ^  ^2  {^jHr-j  +  9jhJ-j  +  hr-jgj  +  Sr-jGjJ  ,  Go  =  0 
j=0 

Hr  =  -jdmg[d^{l-di/PY-\...,dn{l-dJl3y-^] 


For  a  distribution  P)  in  the  radially  truncated  Gaussian  family,  the  mean  is  y  and 

the  covariance  matrix  is  Pr{Xn+2  ^  ^  [9]- 


2.5  Linear  Transformation  and  Summation  of  Truncated  Gaussians 


A  non-singular  linear  transformation  applied  to  random  vector  with  truncated  Gaussian  distribu¬ 
tion  produces  another  truncated  Gaussian  random  vector  (proofs  of  theorems  in  this  section  are 
in  Appendix  A). 
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Theorem  4  //'a:  ~  Nu,M,kifi,P)  y  =  Ax  (where  A  is  any  non-singular  square  matrix)  then 


It  can  be  seen,  through  direct  manipulation  of  truncated  Gaussians  even  in  one  dimension, 
that  the  sum  of  truncated  Gaussian  random  variables  is  not  a  truncated  Gaussian.  We  can 
derive  some  results  for  particular  cases.  Suppose  z  =  x  y  where  x  ~  Ni,^^Mj:,kAy‘x,Px)  and 
y  ^  Nuy,My,ky{ti'y,Py),  ^  and  y  independent.  Under  these  conditions: 

Theorem  5  The  distribution  of  z  has  expected  value  =Ji^-\-Jiy  and  covariance  =  Px  +  Py,- 
the  distribution  of  z  is  positive  only  inside  the  ellipsoid  defined  by  {z  :  (z  —  Uz)'^M~^{z  —  vf)  <  1} 
where 

Vz  ~  *1“  ^y  ~  hx-^x  “b  ^y^^y 

A  reasonable  approximation  to  the  distribution  of  2  is  N^^^MzA^PziP z)- 

A  special  but  important  case  is  represented  by  Vx  =  Px-,  Mx  -  Px,  t^y  =  Py,  My  =  Py,  kx  =  ky. 
In  this  special  case  we  can  make  statements  about  the  distance  between  the  approximation  and 
the  correct  distribution.  Call  Gz  the  correct  distribution  for  z;  then: 

Theorem  6  For  pz  —  px  Py>  Mz  =  Mx  +  My,  kz  —  kx  =  ky,  sup^  \Gz  —  Nuz,Mz,kz{^ziMz)\  is 
0{exp{-kl2)). 

So  for  this  case,  the  indicated  approximation  is  fairly  good. 

Consider  now  the  most  general  case:  z  =  x  +  y,  where  x  and  y  are  arbitrary  truncated 
Gaussians.  We  shall  indicate  approximation  strategies  that  work  well  under  specific  conditions. 
We  consider  the  vector  u,  a  linear  transformation  of  z  so  that  defining  ellipsoid  is  {u  :  u^u  <  1}. 
Call  pu  the  mean  of  u  and  Pu  the  covariance  matrix  of  u.  The  distribution  of  u  is  obtained  by 
convolution,  so  it  is  unimodal  and  closer  to  a  bell-shaped  function  than  the  original  distributions. 
We  may  expect  that  the  distribution  of  u  to  be  closer  to  a  Gaussian  than  the  distribution  of  x  or 
y,  invoking  the  Central  Limit  Theorem. 

Consider  the  approximation  pi(u)  =  Noja{Pu-,  Pu)-  K  all  eigenvalues  of  Pu  are  much  smaller 
than  1,  then  this  approximation  is  reasonable  because  the  mean  will  be  approximately  pu  and 
the  variance  will  be  approximately  Pu-  The  distribution  will  always  have  a  maximum  inside  the 
defining  ellipsoid,  so  it  will  always  yield  a  smooth  approximation  to  pu{u).  An  example  will 
illustrate  this.  Consider  distribution  A^i.5,i,i(0, 1),  depicted  in  figure  1.  Mean  is  1.107  and  variance 
is  0.21288.  If  we  add  two  random  variables  with  this  same  distribution,  we  obtain  (numerically) 
the  distribution  shown  in  figure  2. a.  Mean  is  2.213  and  variance  is  0.4257.  The  approximation 
5, i(2.213, 0.4257)  is  shown  in  figure  2.b.  Both  distributions  are  shown  in  figure  2.c.  Agreement 
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Figure  2;  Example:  Approximation  to  Summation 


Figure  3:  Example:  Failure  of  Approximation  to  Summation 


is  remarkable  even  in  this  highly  skewed  distribution;  the  strategy  will  work  better  if  the  means 
of  the  original  summands  are  closer  to  zero. 

We  may  take  the  approximation  as  valid  for  large  variances,  but  there  is  a  limit  to  such  process. 
We  cannot  use  this  strategy  for  distributions  with  arbitrarily  large  variances.  Again,  we  use  an 
example  to  illustrate  this  situation.  Consider  the  distribution  A^o,i,i(0, 50),  depicted  at  figure  3. a. 
Consider  the  summation  of  two  random  variables  with  this  same  distribution.  Figure  3.b  shows 
both  the  correct  distribution  and  the  approximation  Ao, 1,4(0, 100).  The  disagreement  is  evident. 
So  we  need  a  specific  approximation  for  the  case  of  large  variances. 

A  different  strategy  would  be  to  approximate  the  distribution  of  it  by  a  truncated  Gaussian  that 
matches  the  mean  and  variance  of  it.  This  approach  is  explored  in  Appendix  B;  approximation 
derived  there  are  valid  under  restrictive  conditions  on  the  second  moment  of  the  distribution  of 
It.  Derivation  of  good  approximations  for  all  cases  is  still  an  open  problem. 

3  Inferences  with  the  Truncated  Gaussian 

Inferences  about  a  random  variable  are  obtained  by  application  of  Bayes  rule  associated  with  a 
decision  rule.  We  analyze  two  decision  rules:  maximum  a  posteriori  estimate  and  minimum  square 
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loss  estimate. 

Since  the  truncated  Gaussian  is  not  closed  under  multiplication,  we  should  not  expect  to  be 
able  to  apply  Bayes  rule  and  obtain  a  truncated  Gaussian  distribution.  The  next  example  reveals 
the  problems  that  may  arise  here. 


Example  1  Suppose  x  and  u  are  related  by  z  =  x  +  oj,  p{x)  is  A^o,(i+£)2,3(0,  (1  +  e)^)  and  p{z\x) 
is  iVj.  (i+e)2  3(0:,  (1  +  e)^).  Suppose  the  observation  comes  out  to  be  z  —  2\/3.  Application  of  Bayes 
rule  for  p{x\z)  yields  a  distribution  concentrated  between  [\/3  ~  +  e],  tending  to  a  spike  as  e 

goes  to  zero.  □ 


Fortunately,  we  can  find  a  good  approximation  methodology  for  a  situation  relevant  to  practical 
applications.  We  consider  a  linear  set-up: 


2  =  Hx  -b  uJz 


(3) 


where  x  €  and  z  A  and  H  are  matrices  of  appropriate  dimensions,  x  is  distributed  as  a 

truncated  Gaussian  Ni/x,Mx,kx{k‘'xj  Px)]  is  distributed  as  a  radially  truncated  Gaussian  with  zero 
mean  A^o,p^,fc.(0,  P^).  As  a  result,  2  ~  NHx,Px,kx{Hx,Pz). 

The  central  question  is:  given  that  we  observe  the  value  of  z,  what  can  we  say  about  x? 


The  posterior  is  not  a  truncated  Gaussian  because  the  region  where  the  distribution  is  positive 
is  not  an  ellipsoid  (it  is  the  intersection  of  two  ellipsoids).  The  distribution  still  is  proportional  to 
a  Gaussian  in  the  points  where  it  is  positive,  as  we  show  now.  Consider  two  distributions: 

qiexp  ^“2^^  ~  l^xY  Px^{x  —  Ia{x)  92exp  ~  Pz  “  Hx'^  Ib{x)i 

where  A  and  B  are  arbitrary  convex  sets,  not  necessarily  ellipsoids.  If  we  multiply  these  distribu¬ 
tions  together,  the  result  is  positive  only  on  the  set  A  D  P;  on  this  set  the  result  is  proportional 

W  =  exp  ^-^(x  -  pLx^fp-\x  -  px)  +  (2  -  Hxfp-\z  -  Hx)^  . 

By  using  the  Matrix  Inversion  Lemma  [1],  we  get  the  following  standard  result  of  linear  systems 
theory: 

W  a  exp  -  M')j 

where  Q  =  Px  -  PxH^{HPxH^  +  Pz)~'^HPx  and  p'  =  Q{P-^Px  +  This  shows  that  the 

shape  of  the  distribution  is  proportional  to  a  Gaussian.  So  the  final  distribution  will  be: 


c 

I27rdet((5)|"'/^  ^ 


(-^{x-pYQ  '^{x- p')^  lAnsix), 


(4) 
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where  c  is  a  normalizing  constant. 

So  we  can  easily  recover  the  shape  of  the  distribution;  the  normalizing  constant  c  still  is 
unknown  and  depends  on  AOB.  The  main  difficulty  is  to  find  a  good  and  tractable  approximation 
to  this  region.  This  is  presented  in  the  next  section. 


3.1  Intersecting  Ellipsoids 

In  order  to  approximate  the  posterior,  an  ellipsoidal  approximation  can  be  built  so  that  the 
approximate  posterior  is  always  a  truncated  Gaussian.  This  method  was  originally  proposed  by 
Fogel  and  Huang  [7]  in  the  context  of  tolerance  sets  for  ARMA  processes.  Since  the  algorithm 
approximates  only  the  geometric  intersection  of  ellipsoids,  the  values  of  fj,  and  P  do  not  matter. 
We  indicate  truncated  ellipsoids  by  N{v,  M,  k)  in  this  section. 


3.1.1  Preliminary  Approximation 
We  make  some  preliminary  approximations: 

•  Consider  expression  (3).  Suppose  ~  Af(0,  W,  1/a)  and  x  ~  A’(//,  P,  h). 

•  Now  take  a  linear  transformation  A  such  that  AWA^  =  I  (always  possible  since  W  must 
be  positive  definite).  So  we  have  Az  =  AHx  +  Now  consider  B  —  AH,  y  =  Az  and 
ijy  =  Auy]  we  have: 

y  -  Bx  +  ujy 

and  Uy  ~  N{0, 1, 1/a). 

•  We  know  that  aJ^Uy  <  1;  instead  of  trying  to  satisfy  this  inequality,  we  will  satisfy  the  set 
of  inequalities: 

aw^j  <  1  for  i  =  1, . . . ,  m. 

The  meaning  of  this  approximation  is  the  following:  the  original  constraint  defines  a  spheroid 
in  3?”^;  the  new  constraint  defines  the  m-dimensional  cube  that  circumscribes  the  spheroid. 

•  By  rearranging  terms  we  have: 

a{yi  —  BixY  <\ior  i  =  .  ,m.  (5) 

where  yi  is  the  ith  element  of  y  and  Bi  is  the  ith  row  of  B. 

Now  the  problem  is  how  to  approximate  the  following  set: 

0m  =  :  [(a:  -  fi)'^P~^{x  -  /i)  <  ft]  D  [Pl™ a(j/,-  -  Bixf  <  l] }  • 
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3.1.2  Fogel/Huang  Algorithm 


Consider  the  following  definitions: 

Po  =  P 

$0  =  n 

bo  =  b 

Go  =  (a:  :  [(x  -  60)'^ P'^ix  -  9o)  <  bo] } 

Si  =  {x  :  a{yi  -  Bixf  <1}  for  i  =  1, . . . ,  m. 

Each  set  Si  defines  a  strip  (a  region  in  between  two  hyperplanes). 

The  Fogel/Huang  strategy  is  to  start  with  Go  and  intersect  this  ellipsoid  with  one  strip  at  a 
time,  approximating  the  intermediate  intersections  by  ellipsoids:  We  could  generate  ©„  by  the 
recursion: 

Gt+i  =  Si^\  n  0j. 


•  We  approximate  0i+i  as: 

0.+1  =  {a;  :  [(x  -  pLfP~^{x  -  fi)  +  aqi+i{yi+i  -  Bi+ixf  <  {bi  +  9,+i)] }  (6) 

where  qi+i  is  a  free  parameter  to  be  determined.  0i+i  contains  the  intersection  between  0j 
and  the  strip  Si+i. 

•  We  can  transform  expression  (6)  into  an  explicit  ellipsoid  expression^ 

0.+1  =  {x  :  (x  -  ei+ifPr^\{x  -  Oi+i)  <  7i+i}  (7) 


where: 


Pi+\  =  Pi  ^  +  aqi+iBjBi 


Oi+i  =  Pi+i{Pi  ^Oi  +  aqi+iBfyi+i) 


P.  _  p  _  PBjBjPi 

P.+i  P,  + 

(8) 

.  ,  (2/m-W 

6,+.  +  ?.«  - 


•  We  must  determine  qi+i  by  some  convention.  Fogel  and  Huang  prove  that  it  is  possible  to 
choose  qi+i  so  that  the  square  of  the  volume  of  the  ellipsoid  0,+i  is  minimized.  By  doing 
so,  we  obtain  (proof  of  this  claim  is  in  Appendix  A): 

{0  if  Oj  —  SoiOfa  <  0  or  —02  +  yjoi^  —  ^ocia^  <  0 

-a2+y/Qi-4aia3  , 

- -  otherwise. 

^Using  the  following  standard  result:  (x  —  a)'^ A{x  —  a)  +  (x  —  b)'^ B{x  —  6)  =  (x  —  c)^C'(x  —  c)  where  C  =  A  +  B 
and  c  =  C~^{Aa  +  Bh),  and  the  Matrix  Inversion  Lemma  [1]. 


10 


where: 


ai  =  a\n  - 

02  =  {Bi+iPiBi+i(2an  —  a  +  a^biBi^iPiBi^i  +  a^{yi  —  BiOiY  (9) 

0:3  =  n(l  -  a{yi  -  BiOif)  - 


Summary  of  Fogel/Huang  Algorithm  The  algorithm  is  a  sequence  of  m  steps.  At  each  step: 

1.  Get  from  the  results  of  the  previous  step  using  expressions  (9). 

2.  If  =  0,  the  measurement  did  not  cause  any  change  in  the  ellipsoids.  Updating  is  the 
ellipsoid  is  not  necessary. 

3.  Update  ©i+i  using  expressions  (7)  and  (8). 

4.  If  6, -4.1  <  0,  the  likelihood  and  the  prior  are  in  conflict:  the  sets  have  empty  intersection. 
This  case  is  discussed  below. 

We  take  the  defining  ellipsoid  for  the  posterior  distribution  to  be  {x  :  {x  —  9mYP~'^{x  —  6m)  <  6m}- 
This  is  equivalent  to  the  set  |x  :  (x  —  fi)^P~^{x  —  fi)  +  HiLi  a9t(j/t  —  Bixy  <  6  +  9i}-  Our 

approximation  is  complete. 


3.2  Updated  Posterior 


The  posterior  distribution  is  a  combined  result  of  expression  (4)  and  Fogel/Huang  algorithm;  we 
use  Ne^^p^^b^{n',Q)  as  the  posterior.  By  looking  at  the  recursions  in  expressions  (7)  and  (8), 
we  notice  that,  if  qi+ia  =  1  for  every  then  the  final  approximation  is  in  the  radially  truncated 
Gaussian  family. 

Some  examples  clarify  the  use  of  our  algorithm. 


Example  2  Consider  the  situation: 


z  = 


4  1 
2  3 


X  +  a; 


X  ~  Na,B,i/4{a,  B) 


where  x  is  unknown  and: 


a  = 


1 

1 


B  = 


2  0 
0  2 


2  = 


2 

3 
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Figure  4. a  shows  the  initial  ellipsoids  of  the  problem.  The  region  of  possible  values  of  x  is  the 
intersection  of  ellipsoids.  Figure  4-b  shows  the  strips  generated  at  expression  (5);  the  intersection 
of  these  strips  with  the  prior  ellipsoid  will  define  the  posterior  approximation. 

The  algorithm  is  applied  and  results  are  shown  in  figure  4-c.  The  largest  hatched  ellipse  is  the 
result  of  incorporating  =  2;  the  smallest  hatched  ellipse  is  the  final  result  after  incorporation  of 
Z2  =  3.  The  approximate  posterior  is  Nc,D,e{fiG)  where  e  —  0.257  and: 


■  0.363  ■ 
0.547 

D  = 

0.4488 

-0.3272 

-0.3272  ■ 
0.7555 

/  = 

■  0.321 
0.7418 

G  = 

0.091 

-0.0867 

-0.0867  ■ 
1.8857 

The  approximations  found  by  the  algorithm  are  correct  in  that  the  intersections  are  always 
interior  to  the  successive  approximations.  It  should  be  noted  that  the  original  prior  distribution 
for  X  is  almost  flat;  it  is  possible,  using  our  approach,  to  approximate  situations  where  the  strict 
Gaussian  model  would  be  brittle. 

As  a  comparison,  we  analyze  what  would  have  happened  if  we  had  used  an  approximation 
strategy  in  the  spirit  of  [1]  or  [2].  We  would  pretend  x  and  u  to  be  distributed  as  unbounded 
Gaussians  (with  same  mean  and  variance).  Now  we  find  a  crucial  problem:  how  to  combine  the 
fact  that  X  has  an  elliptic  region  with  radius  1/4  and  uj  has  an  elliptic  region  with  radius  1? 
By  using  Bayes  rule  in  the  unbounded  Gaussians  N{0,I)  and  N{a,B),  we  obtain  an  unbounded 
Gaussian  N(f,  G).  The  hatched  ellipsis  in  figure  4.d  shows  the  results  of  using  these  values  of 
mean  and  variance  with  radius  1/4,  5/8  and  1.  All  ellipsis  fail  to  cover  the  region  where  x  is  known 
to  lie.  Radius  1  /4  is  extreme:  almost  all  values  of  x  inferred  from  this  posterior  are  inconsistent 
with  prior  expectations! 


Example  3  Consider  the  situation: 

x+u  w  ~  Ao, 7,9(0, 1)  X  Ao,/,4(0,  /), 

z  =  [3, 7]^  and  x  is  unknown. 

Figure  5. a  shows  the  ellipsis  defined  by  prior  expectations  and  measurements.  It  is  clear  that 
the  measurement  is  inconsistent,  indicating  either  the  presence  of  outliers  or  mismodeling.  Incor¬ 
poration  of  Zi  yields  the  hatched  ellipse  in  the  figure;  yi  is  consistent.  When  22  is  incorporated, 
we  obtain  62  =  —11.49,  indicating  failure.  □ 


1  0 
0  1 


Notice  that  detection  of  inconsistency  is  important  and  not  trivial  when  using  Gaussian  dis¬ 
tributions.  If  we  were  to  suppose  that  x  ~  A(0, /)  and  uj  ~  A(0, /),  then  the  posterior  would 

/  r  -IP  *1  \ 


be  N 


1.5 

3.5 


0.5 

0 


0 

0.5 


\ .  Again,  it  is  not  clear  how  to  choose  the  radius  for  the  truncated 


posterior.  Figure  5.b  shows  the  resulting  ellipsis  for  radius  4,  6.5  and  9;  none  of  them  reflect  the 
true  state  of  affairs. 
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(a)  Initial  Ellipsoids 


Figure  4:  Example  of  Inference  Algorithm  for  Bounded  Distributions 
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(a)  Bounded  Approximation  (b)  Unbounded  Approximation 


Figure  5:  Example  of  Inference  Algorithm  for  Bounded  Distributions  in  Case  of  No  Intersection 

3.3  Inferences  based  on  Maximum  a  Posteriori 

A  common  estimate  is  the  one  obtained  by  maximizing  the  posterior  density  p{x\z).  Consider 
Nu,  M,k{Pi  P)-  We  take  the  estimate  x  =  arg  maXj,A^^,A/,fc(/^)  P)-  The  optimization  problem  is 
simplified  if  we  perform  a  double  diagonalization.  We  look  for: 

w  =  arg 


There  are  two  possible  cases: 


•  If  <  k  then  0  is  inside  the  defining  ellipsoid.  Since  the  Gaussian  is  unimodal,  the 

maximum  a  posteriori  is  tt)  =  0  (it  must  be  transformed  back  to  x). 

•  Otherwise,  the  maximum  occurs  on  the  boundary  of  the  defining  ellipsoid.  So  the  optimiza¬ 
tion  problem  is: 

w  =  arg  min{„,(^_^)Tp-i(,„_^)=fc}U;^u; 

Defining  a  Lagrange  multiplier  A,  we  obtain  the  Lagrangian: 

w  =  X  [(lo  —  —  0  “  ^]  + 

Differentiation  of  the  Lagrangian  with  respect  to  w  and  A  produces  a  set  of  equations: 

Wi  -f  A — -  =  0  for  Z  =  1,  .  .  . ,  n  and  2^ - :: -  =  k. 


D, 


i=i 
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If  we  isolate  Wi  in  each  one  of  the  linear  equations  we  obtain  Wi  =  Combining  all 

equations: 

«  /  >  \  2 
St 


t=i 


1  ^  "b  . 


=  k. 


(10) 


We  must  solve  this  equation  for  A.  Consider  the  function  £f(A)  =  J2i=i  Da  •  dW 

is  always  positive  and  goes  to  zero  if  A  goes  either  to  infinity  or  minus  infinity.  g{X)  has 
singularities  at  -Du  for  all  at  these  points  it  goes  to  infinity.  Suppose  we  order  the  values 
of  -Du  so  that  we  have  Di  <  D2  <  .  •  -  <  -Dn  <  0.  We  know  that  the  solution  for  equation 
(10)  that  corresponds  to  a  minimum  is  a  positive  A,  since  this  is  the  only  way  we  can  have 
Wi  <  (i  for  all  i.  So  the  solution  must  lie  in  the  interval  (D„,oo).  So  the  constrained 
minimization  is  solved  by  performing  a  double  diagonalization,  sorting  the  values  of  Du  and 
searching  for  the  solution  of  equation  (10)  in  the  interval  (Z)„,oo).  Since  the  function  g{X) 
is  decreasing  in  this  interval,  Newton’s  method  will  work  if  started  with  any  value  e,  larger 
than  Dn  but  smaller  than  the  solution. 


3.4  Inferences  based  on  Square  Loss 

Another  common  estimate  is  obtained  by  minimizing  expected  square  loss:  L{x,  x)  =  {x-  x)^(x  - 
x).  The  optimal  estimate  is  the  conditional  mean  x  =  Efxjz]  [4].  Since  we  have  the  posterior,  we 
can  evaluate  the  mean  by  the  methods  previously  presented. 


4  Filtering  with  the  Truncated  Gaussian 


Using  all  results  so  far  presented,  we  can  derive  a  filtering  scheme  akin  to  the  Kalman  filter  but 
using  truncated  Gaussians. 


Consider  first  the  simple  linear  system: 

Xi^l  —  Axi 

Zi+i  =  Bxi+1  +  Wi. 


(11) 


We  assume  Xq  distributed  as  a  truncated  Gaussian  and  w  distributed  as  zero  mean  radially  trun¬ 
cated  Gaussian.  A  and  B  are  matrices  of  appropriate  dimensions. 


Given  this  set-up,  we  are  interested  in  obtaining  p{xi+i\zi, . . . ,  ^i-i-i): 

p(^i+l  l^^l,  •  .  .  5  ■2^»4-i)  OC  p(zi.(-l  .  .  .  ,  2:,')  X  p(Xj.).l  |zi, .  .  .  ,  Zj) 

=  p(z,+i|x,+i)  X  p(x,+i|zi, .  .  .  ,  Zi) 
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The  distribution  p{zi^i\xiJi^i)  is  directly  obtained  from  the  distribution  of  uji.  We  need  to  find  the 
distribution  p(a;,+i j^i, . . . ,  z,-)  =  •  • » ^«)*  Using  theorem  4  we  can  obtain  p{Axi\zi, . . . ,  2,) 

from  the  available  p{xi\zi, . . .  ,2,).  So  the  filtering  scheme  has  the  following  structure: 

1.  Propagate  forward  the  uncertainty  in  a;,-. 

2.  Use  expression  (4)  and  Fogel/Huang  algorithm  in  order  to  fuse  the  incoming  information 
2, -4.1  with  Xj. 


If  necessary,  the  best  estimate  of  x  can  be  obtained  at  any  time. 

Extension  of  the  filter  above  to  a  noisy  state  problem  is  straighforward.  Consider  the  system 
model: 


x.'+i  =  Axi  + 

2,+i  =  Bxi+iAoJzr 


(12) 


In  this  case  we  must  obtain  p(x,+i|2i, . . .  ,2,-)  by  calculating  p(Ax,|2i, . . . , 2,)  and  combining  it 
with  p{ijJzi)-  Having  done  this,  we  can  propagate  forward  the  uncertainty  in  x,-  (by  properly 
approximating  the  summation  of  Axi  and  and  fuse  it  with  the  incoming  information  (through 
the  Fogel/Huang  algorithm). 


5  Using  the  truncated  Gaussian;  Disparity  Constraint 


In  this  section  we  analyze  a  practical  problem  in  Computer  Vision  using  truncated  Gaussians. 
The  example  will  illustrate  the  power  of  the  method. 

Consider  two  cameras  perfectly  aligned  so  that  all  epipolar  lines  are  parallel.  Suppose  a  feature 
is  detected  in  one  camera  at  pixel  X2.  The  correspondence  to  the  feature  must  lie  in  the  epipolar 
line  in  the  other  camera  in  some  pixel  Xi.  The  question  is:  how  much  of  the  epipolar  line  should 
we  explore  in  order  to  find  the  correspondence  xi? 

From  the  basic  optics  of  the  problem  we  can  derive  the  following  equation  [6]: 

Xi  =  X2  +  — ,  (13) 

2 

where  X2  is  the  known  coordinate  of  the  correspondence  (in  pixels),  B  is  the  baseline  distance  (in 
meters),  /  is  the  focal  length  (in  pixels)  and  2  is  the  depth  of  the  feature  (in  meters).  Only  2  is 
Unknown;  as  an  example,  we  take  X2  =  0,  /  =  600  and  B  =  0.5. 
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A  powerful  constraint  that  can  be  used  is  the  fact  that  the  point  Xi  cannot  be  arbitrarily  far 
from  the  given  value  of  X2.  This  constraint  has  not  been  justified  or  analyzed  with  the  help  of  a 
statistical  model. 


The  model  (13)  does  not  take  into  account  the  fact  that  every  camera  has  finite  depth  of  view. 
Not  all  values  of  2:  are  allowed  in  a  real  camera.  This  can  be  modeled  by  a  radially  truncated 
Gaussian:  pz{z)  =  As  an  example,  we  take  /x  =  5,  cr^  =  4  and  A:  =  3.  Note  that  the 

variance  is  large,  reflecting  a  possible  situation  of  ignorance  with  respect  to  2r.  The  distribution 
of  2:  is  shown  in  figure  6. a.  The  distribution  of  Xi  is  [10]: 


PXx  (a:i)  = 


Bfxf^ 


This  distribution  is  graphed  in  figure  6.b. 


A  simple  calculation  will  show  that  Xi  e  [35.4438, 195.325].  This  could  have  been  obtained  from 
a  tolerance  set  approach:  if  we  simply  assume  2:  €  [1.5359,8.4641],  then  xi  e  [35.4438, 195.325]. 
But  notice  that  by  using  truncated  distributions,  we  obtain  much  more  information  beyond  the 
bounds:  we  know  where  we  should  invest  time  searching  if  we  have  only  bounded  resources;  we 
know  the  means  and  variances  of  the  quantities. 


Consider  now  the  following  extension  of  the  problem:  suppose  we  have  additional  noise  present 
in  the  imagery  process,  so  that: 

Bf 

X\  —  X2  H - (■  oj. 

z 


where  u  ~  Ao,i,io(0, 1).  How  are  we  to  use  this  valuable  information  if  we  work  only  with  bounds? 
We  show  now  how  to  easily  handle  this  in  the  framework  of  bounded  distributions. 


We  can  easily  obtain  the  mean  and  variance  of  X2  +  {Bf)/z  by  numerical  integration.  If  this 
is  considered  a  burden,  we  can  linearize  expression  (13)  around  the  mean  of  .2: 

Bf  Bf, 

Xif^X2  + - -{Xi  -  fi) 

fl 

and  from  this  we  can  approximate  the  mean  and  variance  of  xi.  With  the  values  previously  out¬ 
lined,  we  obtain  mean  60  and  variance  382.25.  Since  the  standard  deviation  is  much  smaller  than 
the  domain  of  definition,  we  take  iVns.as, 6490.5, i(60, 382.25)  as  an  approximation  for  the  distribution 
of  X2  +  (Bf)fz.  Figure  6.c  shows  the  agreement  between  approximation  and  correct  distribution. 
Now  we  can  proceed  and  combine  this  with  a;;  we  can  actually  use  this  measurement  for  filtering 
purposes  through  algorithms  outlined  in  previous  sections. 

Since  our  approach  is  mainly  approximative,  it  is  interesting  to  compare  its  performance  with 
similar  methods  in  the  literature.  We  already  mentioned  the  inability  of  tolerance  sets  to  capture 
all  the  information  in  the  most  simple  situations.  We  now  analyze  another  common  strategy. 
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Suppose  we  consider  z  as  a  unbounded  Gaussian  variable.  We  approximate  the  distribution  of 
xi  through  a  linearization:  a:i  -  iV(60, 382.25).  How  to  justify  the  disparity  constraint?  We  must 
arbitrarily  set  a  threshold,  based  on  extraneous  heuristic  knowledge,  if  we  want  to  decide  whether 
a  value  of  Xi  is  possible  or  impossible.  The  model  neglects  the  physical  source  of  the  constraint, 
the  camera’s  depth  of  field.  And  finally,  the  model  is  also  an  approximation,  a  bad  one  indeed. 
Figure  6.d  shows  the  roughness  of  the  approximation.  The  figure  depicts  the  result  of  truncating 
the  Gaussian  iV(60, 382.25)  after  2  standard  deviations:  some  possible  values  of  xi  are  discarded, 
many  impossible  values  of  xi  are  included.  It  the  correct  value  of  xi  is  located  between  100  and 
195,  a  failure  will  result.  We  could  try  to  cover  all  possible  values  of  xi,  but  unless  we  set  the 
necLsary  thresholds  in  a  very  conservative  way,  some  possible  values  of  xi  may  be  discarded  as 

impossible. 


6  Conclusion 


The  central  idea  of  this  research  is:  measurements  that  are  bounded  must  be  properly  modeled  in 
order  to  be  consistently  used.  Our  focus  is  not  so  much  is  data  produced  by  selection  mechanisms, 
we  focus  on  data  that  is  fundamentally  bounded.  We  take  this  type  of  data  to  be  fundamental  in 
Robotics  application  such  as  computer  vision  or  object  recognition.  Our  proposal  is  to  use  a  family 
of  statistical  models  (the  truncated  Gaussian  family)  that  captures  measurement  boundedness. 
We  offered  a  comprehensive  analysis  of  estimation  aspects  for  the  truncated  Gaussian:  alprithms 
for  information  handling,  updating  and  filtering.  An  open  problem  is  to  find  good  approximations 
for  summations  of  truncated  Gaussians. 

The  statistical  approach  implied  by  bounded  distributions  is  more  interesting  than  pure  toler¬ 
ance  sets  in  a  variety  of  grounds.  First,  we  can  use  the  powerful  language  of  Probability  theory  to 
draw  conclusions.  Second,  usually  the  mean  and  variance  of  disturbances  are  known  or  can  be  es¬ 
timated.  We  should  use  this  valuable  information  if  available,  as  an  example  illustrates:  Consider 
a  variables  a  G  [-7r/2, 7r/2]  and  a  derived  quantity  b  =  tan(a),  such  that  a  e  [-7r/2, 7r/2].  Without 
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any  statistical  knowledge,  we  only  obtain  that  b  €  (—00,00)!  If  instead  we  take  a 
then  we  can  obtain  6  ~  (1  +  6^)A^o,i,i(^»  1)  where  the  analysis  can  proceed.  Situations 

like  this  can  appear,  for  example,  when  using  triangulation  in  order  to  recover  position.  The 
adaptation  of  tolerance  sets  to  the  language  of  Probability  theory  is  an  area  in  which  our  models 
can  be  applied. 

Another  possible  application  of  our  results  is  to  use  a  Gaussian  model  for  unbounded  data, 
but  define  selection  mechanisms  on  the  data.  We  discussed  this  strategy  in  section  3.2;  this  is 
a  common  problem  in  Statistics.  Our  methods  have  the  advantage  that,  although  approximate, 
they  never  assign  probability  zero  to  a  possible  value  of  the  underlying  unknowns.  Algorithms 
here  developed  can  find  application  in  any  of  the  usual  problems  of  measurement  error  analysis, 
when  measurements  are  bounded  and  when  measurements  are  selected.  In  particular,  the  correct 
development  of  a  Bayesian  analysis  of  validation  gates,  as  used  by  Bar-Shalom  and  Fortmann  or 
Cox,  is  an  imediate  example  of  application. 
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A  Proofs  of  Theorems 


Theorem  1  Given  a  expected  value  is  p,  a  covariance  matrix  Q  and  the  fact  that  a  distribution 
is  zero  outside  the  set  {x  :  (x  -  p)'^Q~^{x  -  p)  <  k},  a  maximum  entropy  distribution  that  obeys 
these  conditions  is  a  truncated  Gaussian  N^^cQ,k'{P;cQ),  where  c=  Pr{x\  <  ^)(-P’'(Xn+2  ^  k))  ^ 

and  y  =  kPr{xl+2  <  k){Pr{xl  <  k))'^ ■ 

Proof:  It  is  known  that  the  distribution  which  is  positive  in  a  region  C  and  which  maximizes 
entropy  for  given  mean  and  covariance  matrix  is  of  the  form  [10]: 

p{x)  =  Aoexp  (-Aix  -  (x  -  p)'^A2^{x  -  //)) 

for  x  E  C  and  zero  otherwise.  Ao,  Ai  and  A2  are  respectively  a  scalar,  a  vector  and  a  matrix. 

The  key  to  this  optimization  is  to  note  that  iwlogw  is  convex  and  the  constraints  are  linear, 
so  the  solutions  for  Ao,  Ai  and  A2,  if  they  exist,  are  unique  and  optimal.  By  straighforward 
substitution,  we  notice  that  if  p{x)  =  N^^^cQ,k'{|^■>(^Q)  mean  is  p,  the  variance  is  Q  [9]  and 

the  defining  ellipsoid  is  {(x  —  f^)Q  —  /^)  —  k}-  So  the  truncated  Gaussian  is  the  maximum 
entropy  solution.  □ 


Theorem  2  The  moment  generating  function  of  a  truncated  Gaussian  iVu-,D,fc(0, 1)  is: 

,,  ,  q{u-t,D,k)  (ft\ 


(14) 


Proof: 

<f>{t) 


(q(w,DM-'  I  exp(-lz^z  +  fz)dz 

yJ{2'KY  '^(Er=i <fc)  2  / 

\f¥^  \  2 

{q{u},D,k))-^ 


— 1  /  exp  f— -(z  —  —  o')  dz 


't'^r 


■  exp 


^(27r)"  V  2  / 

(9(0;,  D,  k))~^  exp  j  q{u  - 1,  D,  k) 


exp  dw 


□ 


Theorem  4  //x  ~  Nu,M,k{t^,P)  f^nd  y  =  Ax  (where  A  is  any  non-singular  square  matrix)  then 
y  ~  NAiz,AMAT,k{^fJ',^P ^^)- 
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Proof:  The  distribution  of  y  is  given  by  pY{y)  =  this  gives: 

(  \  _  1  q{p,  P,  M,  k) 

-  |det(A)ll(27r)Met(P)|l 

exp  ^-^(y  -  AfJ,)'^A~'^P~'‘^A~^{y  -  Ap)^  I{{y-Ai^)TA-TM-^A-^(y-Au)<k}{y) 
q{p,  P,  u,  M,  k)  ^ 

|(27r)Met(^)det(M)det(>l)|2 

exp  (-^(y  -  Ap)'^{APA'^)~\y  -  Ap)^  I{(y-A^)'r(AMA'r)-Hy-Ai^)<k}iy) 

=  ^Au,AMAT,k(Ap,AMA'^) 

□ 

Theorem  5  The  distribution  of  z  has  expected  value  Ji^  =  +  and  covariance  P^  =  P^  +  Pj,; 

the  distribution  of  z  is  positive  only  inside  the  ellipsoid  defined  by  {z  :  (z  —  Vz)'^Mf^{z  —  vf)  <  1} 
where 

Uz  =  Vx-\-  Vy  Mz  =  kxMx  +  kyMy 

Proof:  Given  the  independence  of  x  and  y,  the  expressions  for  and  Pz  are  straightforward. 
It  is  necessary  to  prove  that  p{z)  is  positive  in  an  ellipsoid  {z  :  {z  -  pz)^Mf'^{z  -  pz)  <  fc}.  For 
this,  apply  a  linear  transformation  to  x  and  y  so  that: 

x'  =  $^(\/A  V^)(a:  -  px) 

y'  =  $^(\/A  V^)(y  -py) 

where:  A  is  the  eigenvalue  matrix  of  Mj;,  V  is  the  corresponding  eigenvector  matrix  of  My  and  $ 
is  the  eigenvector  matrix  of  (\/A~V^)Mj,(\/A~V^).  As  a  result  of  this  transformation: 

x'  ~  No,i,kziO,I) 

y'  ~  No,L,ky{^,L) 

where  I  is  an  identity  matrix  of  appropriate  dimension  and  T  is  a  diagonal  matrix  of  appropriate 
dimension.  The  distribution  oiu  =  x'  -\-y'  is  given  by  the  multi-dimensional  convolution  between 
x'  and  y'.  So  the  distribution  of  u  is  positive  in  a  set  obtained  by  sliding  a  spheroid  along  an 
ellipsoid.  In  other  words,  u  is  positive  inside  an  ellipsoid  defined  by  {u  :  vF{kxI  -h  kyL)~^u  <  1. 
In  order  to  translate  this  into  a  constraint  about  z,  we  use  two  facts: 

u  =  V^))  {x  +  y-^.-i,y)=  V^))  (z-y.)  (15) 
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and 


=  {kxMx  +  kyMy)  (16) 


in  order  to  obtain: 

{u  :  u^{kxl  +  kyL)~^u  <  1}  = 

{z  :  V^)(2  -  {kj  +  kyiy^  V^)(0  -  <  1}  = 

{z  :  (z  -  pL^Yik^Mx  +  kyMy)~^{z  -  <  1}  = 

{z\{z-  fj.:,)'^M~^{z  -  <  1} 

□ 


Theorem  6  For  +  fJ^y,  F  My,  k^  =  k^  =  ky,  sup^  |G^  -  N^^^M^,k{^^z■,M^)\  is 

0(exp(— A:/2)). 

Proof;  It  is  enought  to  prove  that,  for  a  given  ko,  there  exists  a  constant  a  (which  depends 
only  on  n)  such  that 

dsup^  \G,  -  ^  ^exp(-fc/2)  ^  ^  ^ 

dk  2 


We  start  with  the  same  double  diagonalization  used  in  the  previous  theorem.  Consider  the 
vector  [u,vY  given  by: 


u  =  x'  y' 


V  =  y 

The  distribution  of  u  is  given  by  the  multi-dimensional  convolution  between  x'  and  y': 

Pu{'^)  =  I  Px>{u  —  v)pYi{v)dv 

JW,L 


where  Wu  =  {u  :  (u  —  u)^(u  —  v)  <  k}  f]  {v  :  v^L  <  k}.  Using  the  distributions  of  x'  and  y' 
we  obtain: 


pyipa)  =  f  - - -  exp  ("— i  ((u  — 'y)^(u  —  u) -f  du 

■fwu  |(27r)2”det(L)|2  \  2  '  V 


(17) 


Expression  (17)  can  be  simplified  by  noting  two  facts: 
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Call  li  an  eigenvalue  of  L  {L  is  diagonal): 

li 


det(L)  =  n  = 11(1 + - 11(1 + /.)  n(i + i/^i)  ^  =  det(/ +  I-)det(7  + 

i  i  ^  '  *  i  i 

•  Consider  the  quadratic  form  {u  -  v)'^{u  —  u)  +  v'^L~^v;  it  is  a  summation  of  terms: 

„?  7,2  7,2 


_  1  +  _ 2u-v — —  +  v^ 

~  l  +  h  k  W+hY  ‘  ‘1+/.- 

'^1  1  +  /  n  ,  1  /;  \  n2 

~  l  +  h  h  ■ 


dv 


Applying  these  results,  expression  (17)  can  be  written  as: 
puiu)  =  I(u) 

|(27r)"det(/ +  ii)l2  ^  ^  7 

where 

=  /  un  '  .  .»!r,  n  n|i  (-5(’’  -  (^  + 

|(27r)"'det((/ +  7i“^)“^)|2  x  2  / 

so  u  ~  /(u)A^fc(O,/  +  X).  By  using  expressions  (15)  and  (16),  we  transform  pui^)  into  p{z): 

(-5<^  -  -  "’O 

where  Wu  is  transformed  into  Wz  by  straightforward  substitution  and 

(  ^  ^  |(27r)”det((/ +  7i“^)”^)|2 

exp  Al  (»-(/  +  L-')  ($^x/A‘ V’’)  (z  -  i,.)y  (/  +  X-’) 

(e  -  (/  +  i-‘)  («^\/r  V^)  {z  -  p.)))  <Je. 

Observe  that  I(z)  is  equal  to  q{k)~^  times  the  integral  of  a  Gaussian  in  a  region  of  (so  this 
integral  is  strictly  less  than  1).  We  can  state  I(z)  <  q[k)~^.  So  we  can  write: 

sup|G^  - A^^,,M.,fc^(/^^,M^)|  =  sup  \I(z)Nk{pz,  M^)  -  Nk{pz,M^)\  < 

(  fn-'  V  (  9(^)"H9(^)"^  - 1) 
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Consider  the  following  function  of  k: 


g(k)  =  q{kr'{q(k)-'  -  1) 


Then; 


dgjk) 

dk 


’2 

exp  1 

[-1) 

,1 

3 

1  _ 

2"/2r(n/2) 

and  since  the  term  in  brackets  is  larger  than  one: 


dg{k) 


< 


kn/2-l  gxp  /jn/2  1 

/r«i-t  /  //^\ 


dk  '  2V2r(n/2)  '  2"/2r(n/2) 

for  all  k  >  ko.  By  collecting  constants  in  a,  the  result  is  proven.  □ 


Claims  in  Appendix  3 
^,•4.1  is  given  by  equations  (9). 

Call  Vi+i  the  square  of  the  volume  of  the  ellipsoid  0,+i.  Since  ©i+i  always  contains  the  desired 
intersections  for  any  value  qi+i,  we  are  interested  in  the  smallest  possible  0j+i.  The  value  of  qi+i 
is  then  chosen  so  that  K+i  is  minimal. 

We  have  K+i  =  c„6-'^idet(P,+i),  where  c„  is  the  volume  of  a  n-dimensional  spheroid.  Some 
algebraic  manipulations  show  that  in  order  to  minimize  K+i  we  must  minimize 

bj  -t~  qi+i  ~  o-qi+iivi  H~  ogi+iBiPiB^  ) 

1  +  aqi^\BiPiBj 

with  respect  to 


Claims  in  Section  B 


1.  Equation  (19)  has  a  unique  solution  if  0  <  <  l/(n  +  2). 

Proof:  Call  h  =  ccr^;  there  is  a  one-to-one  correspondence  between  c  and  h.  So  we  can 
study  the  following  equation: 


Pr(xn+2  <  1/fe)  ^  2 

Pr{xl  <  l//i) 


We  analyze  the  function  g{h) 


”  Pr(xl<ylh)  ■ 


Notice  that; 
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•  limh^o  g{h)  =  0. 

•  lim/i_^oo  gih)  =  This  is  obtained  by  using  the  following  expansion  of  the  chi-square 

probability:  Pr(xl  <  x)  =  ESo  r(i/2+i+i)- 

•  g{h)  is  equal  to: 

which  is  increasing  with  h  (variance  increases  with  h). 

So  g{h)  is  increasing  from  0  to  l/(n  -f  2).  Any  value  of  cP  strictly  between  these  limits  will 
produce  a  solution  for  d  and  hence  for  c.  □ 

2.  Integral  fxTx<i  xx'^p4{x)dx  is  equal  to  expression  (23). 

Proof:  We  have  four  terms  to  integrate: 

•  fx^x<i  xx'^lp4(x)dx  is  equal  to  cr'^I. 

•  fx'^x<i  xx'^ (T'^tv(L)p4(x)dx  is  equal  to  <THr{L)L 

•  Ix^xKi  xx^^xp4(x)dx  is  zero. 

•  fxXx<i  xx^{x'^ Lx)p4{x)dx  is  a  diagonal  matrix.  Consider  element  i  in  the  diagonal 

of  this  matrix.  It  is  a  summation  of  one  term  La  JxTx<i  xjp4{x)dx,  and  n  —  1  terms 
Ljj  x]xjp4{x)dx.  The  fourth  moment  of  any  x,-  is  6(7^7  and  the  second  crossed 

moments  of  xfxj  are  2(7^7.  So  the  matrix  is  equal  to  -1-  2<7^7tr(X)/. 

Putting  all  these  terms  together,  we  obtain  expression  (23).  □ 


B  Approximations  of  Bounded  Distributions  given  First 
and  Second  Moments 


One  may  want  to  approximate  a  distribution  with  bounded  support  by  a  truncated  Gaussian. 
For  example,  it  may  be  necessary  to  obtain  an  approximation  for  summations  of  truncated  Gaus- 
sians.  A  natural  approximation  is  to  pick  a  truncated  Gaussian  that  matches  the  first  and  second 
moments  of  the  original  distribution. 

In  this  section,  we  assume  that  the  distribution  to  be  approximated  is  positive  in  the  spheroid 
{x  :  x^x  <1}  and  has  a  diagonal  second  moment  matrix  Q.  This  may  be  assumed  without  loss 
of  generality  if  the  original  distribution  is  positive  in  an  ellipsoid  (since  we  can  perform  a  double 
diagonalization).  Approximations  are  valid  if  no  eigenvalue  of  Q  exceeds  l/(n  +  2),  where  n  is  the 
dimension  of  the  space. 
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B.l  Equal  Variances  Case 


In  this  section  we  consider  a  distribution  fxix)  with  Q  =  and  mean  //.  Take  an  initial 
approximation  of  the  form: 


Pi(a:) 


1 

q^J(2^rccr‘^)”■ 


exp 


(18) 


where  c  is  the  solution  of  the  equation: 


Pr {xl+2  <  l/(g^^))  ^ 
Prixl  <  l/(c^^)) 


(19) 


This  equation  always  has  a  unique  solution  if  0  <  cr^  <  l/(n  +  2),  as  proved  in  Appendix  A.  The 
limiting  value  l/(n  +  2)  corresponds  to  the  second  moment  of  a  constant  distribution. 


The  variance  of  X  under  the  approximation  pi(a;)  is  exactly  cr^/.  But  the  mean  is  zero.  We 
adjust  the  mean  through  a  linear  term: 


P2{x) 


1 

q^{2'KC(j'^Y 


exp 


(®)' 


(20) 


The  function  P2{x)  is  not  a  distribution,  since  it  may  have  negative  values.  So  we  use  another 
approximation:  (1  +  (//^/cr^)x)  k.  exp(//^/cr^x),  in  order  to  obtain: 


P3{x)  = 


q'<^{2'Kca'^Y 


exp 


x^x  —  2cp^x  +  (?pr  p 

2c(t2 


(21) 


Essentially,  we  are  constructing  a  Edgeworth-like  expansion  [3]  for  fx(x)  and  then  approximat¬ 
ing  it  by  an  exponential.  If  we  have  higher  moments  we  can  increase  the  number  of  terms  in  the 
approximation  by  building  a  sequence  of  orthogonal  polynomials  (with  norm  based  on  exp(-x^)), 
as  detailed  by  Cramer  [3]. 


Example  4  Consider  the  distribution  f{x)  =  ((1  —  x)f2)Ix2<i{x),  with  mean  -1/3  and  variance 
2/9.  This  distribution  is  highly  skewed.  We  obtain  c  =  1.51696  and: 

,  .  _ 1 _  (  (x  +  (c/3))^\  .  . 

~  1.16599y27r(2/9)c^^^  \  2(2/9)c  J  ^  ^ 

Figure  7  shows  the  graphs  of  f{x)  andp^ix).  □ 
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Figure  7:  Distribution  f(x)  and  the  approximation  ^3(0;). 

B.2  General  Case 

We  relax  the  assumptions  of  last  section,  taking  Q  to  be  diagonal.  Take  function  pi(')  in  expression 
(18)  as  an  initial  approximation.  There  is  freedom  in  the  choice  of  cr^.  We  choose  to  be  the 
largest  eigenvalue  of  Q\  the  constant  c  is  the  solution  of  equation  (19).  Approximation  pi(-)  does 
not  produce  the  correct  mean  and  variance.  First  we  adjust  the  mean  through  a  linear  term 
obtaining  function  p2{-)  as  in  expression  (20).  Approximation  p2{-)  is  flatter  than  the  original 
distribution.  We  can  stop  here  and  obtain  p3(-)  as  in  expression  (21). 

Alternatively,  we  can  improve  the  approximation  by  adding  more  terms  to  P2(-)-  Pick  a 
quadratic  term: 

L  is  a  diagonal  matrix  that  produces  the  correct  second  moment  for  P4(-)-  Since  p2(-)  is  flatter 
than  the  original  distribution,  we  deflne  L  as  a  diagonal  negative  definite  matrix. 

Function  P4(x)  may  not  be  a  distribution.  So  we  use  another  approximation: 

(1  -  <7^1(1)  +  +  x^Zx'j  «  exp  (/'  +  .  (22) 

where  /'  must  be  obtained  numerically  by  solving  l'exp(l')  =  1  —  cr^tr(L).  Solution  is  always 
possible  since  L  is  defined  to  be  negative  definite.  The  final  approximation  is  the  product  of  pi(x) 
and  the  approximation  in  expression  (22). 

We  now  indicate  how  to  obtain  L.  The  value  of  /a;Tj:<i  xx^ p4(x)dx  is  (Appendix  A): 

+  (27cr^  —  a^)ix{L)I  +  cr^  I  (23) 
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-V 


where  7  =  c^-P»'(Xn+4  —  (V(^‘^^)))(-^^(Xn  ^  ^  vector  of  diagonal  elements 

of  Q  and  B  the  vector  of  diagonal  elements  of  L]  then  (1  is  a  vector  of  ones): 


47/  /  47<t‘‘ 


Closed  form  solution  of  this  equation  is: 


B  = 


27-1 

(4  +  2n)7  —  n 


A-a^l 

47cr‘‘ 


(24) 


From  B  we  can  obtain  L.  These  formal  manipulations  do  not  guarantee  that  the  resulting  L 
matrix  is  negative  definite,  as  required.  Notice  that  {A  -  a^l)l{4'ya'^)  is  a  vector  of  non-positive 
numbers.  From  the  geometry  of  expression  (24)  it  is  clear  that  all  eigenvalues  of  Q  must  lie  in  a 
wedge  in  the  first  quadrant.  If  this  fails  to  happen,  we  can  increase  cr^.  It  this  fails,  we  can  set 
the  negative  elements  of  L  to  zero.  Albeit  crude,  the  strategy  will  always  improve  upon 


» 


« 
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